Abstract: This paper deals with rate distortion, or source
coding with a fidelity criterion, in measure spaces, for a class
of source distributions. The class of source distributions is
described by a relative entropy constraint set between the true
and a nominal distribution. The rate distortion problem for the
class is formulated and solved using minimax strategies,
which result in robust source coding with a fidelity criterion. It
is shown that minimax and maxmin strategies can be computed
explicitly, and that they are generalizations of the classical
solution. Finally, for discrete memoryless uncertain sources, the
rate distortion theorem is stated for the class, omitting the
derivation, while the converse is derived.

I. INTRODUCTION 

This paper is concerned with lossy data compression 
for a class of sources defined on the space of probability 
distributions on general alphabet spaces. In the classical rate 
distortion formulation with the fidelity decoding criterion, 
Shannon has shown that minimization of mutual information 
between finite alphabet source and reproduction sequences 
subject to fidelity criterion over the reproduction kernel 
has an operational meaning: it gives the minimum
amount of information needed to represent a source symbol by a
reproduction symbol with a pre-specified fidelity or distortion
criterion.

The classical rate distortion function for finite-alphabet 
and continuous sources has been studied thoroughly in the 
literature [1], [2], [3], [4] and [5]. A survey of the theory 
of rate distortion is given in [4]. The formulation of rate 
distortion function for abstract alphabets is investigated by 
Csiszar in [5]. Specifically, in [5] the question of existence
of a solution in Polish spaces, under some continuity
assumptions on the distortion function and compactness of
the reproduction space, is established under the topology
of weak convergence. The formulation in [5] is based on
two important assumptions, namely, 1) compactness of the
reproduction space, 2) absolute continuity of all marginal
distributions with respect to the optimal marginal distribution.
The compactness assumption is crucial in order to
formulate the problem using countably additive measures,
and to show existence of the minimizing measure using
tightness arguments and Prohorov's theorem [5]. Under these
assumptions, the optimal solution is derived and it is given
by

$$q^*(x, dy) = \frac{e^{s\rho(x,y)}\,\nu^*(dy)}{\int_{\hat{A}} e^{s\rho(x,z)}\,\nu^*(dz)} \tag{1}$$

where $\rho$ is the distortion function, $q^*$ is the optimal
conditional distribution, $\nu^*$ is the optimal marginal distribution,
$\hat{A}$ is the reproduction space, and $s \in \Re$ is the Lagrange
multiplier associated with the fidelity constraint.

One of the fundamental issues for abstract alphabets is
whether the nonlinear equation in (1) has a solution. For the
finite alphabet case, the existence of a solution to (1) follows
from the Blahut algorithm [2], because in the limit the
algorithm leads to an equation like (1). For general abstract
spaces, (1) may not have solutions. Clearly, if (1) does not
have a solution, then the minimizing measure exists but one
cannot claim that it has the form given by (1). Existence of
a solution to the implicit nonlinear equation (1) is proved
in [14] using the Tihonov fixed point theorem, which holds for
locally convex topological vector spaces.
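
For a finite alphabet, the fixed point behind (1) can be made concrete. The following is a minimal Python sketch of a Blahut-type alternating iteration, assuming a finite source with probabilities `mu`, a distortion matrix `rho` and a fixed slope `s <= 0`. The function name `blahut_arimoto`, its arguments and the iteration count are illustrative choices, not taken from [2].

```python
import math

def blahut_arimoto(mu, rho, s, iters=500):
    """Alternate between the kernel of form (1) and its output marginal.

    mu  : list of source probabilities mu[x] (assumed finite alphabet)
    rho : distortion matrix rho[x][y] >= 0
    s   : slope parameter, s <= 0
    Returns (q, nu, D, R) with the rate R in nats.
    """
    nx, ny = len(mu), len(rho[0])
    nu = [1.0 / ny] * ny  # initial guess for the reproduction marginal
    for _ in range(iters):
        # q(x, y) = e^{s rho(x,y)} nu(y) / sum_z e^{s rho(x,z)} nu(z)
        q = []
        for x in range(nx):
            w = [math.exp(s * rho[x][y]) * nu[y] for y in range(ny)]
            z = sum(w)
            q.append([wi / z for wi in w])
        # nu(y) = sum_x mu(x) q(x, y)
        nu = [sum(mu[x] * q[x][y] for x in range(nx)) for y in range(ny)]
    # Average distortion and mutual information of the final pair (q, nu)
    D = sum(mu[x] * q[x][y] * rho[x][y] for x in range(nx) for y in range(ny))
    R = sum(mu[x] * q[x][y] * math.log(q[x][y] / nu[y])
            for x in range(nx) for y in range(ny) if q[x][y] > 0)
    return q, nu, D, R
```

For a uniform binary source with Hamming distortion and `s = math.log(0.1 / 0.9)`, the returned distortion should be close to 0.1, in line with the classical binary result.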

Source coding theorems with fidelity criteria for abstract 
sources are discussed in many papers. For separable metric
spaces, results in this direction can be found in [6]. These
results are applicable to the setup considered in this paper.
Alternative approaches based on large deviation techniques
are given in [8], while methods based on the generalized AEP
(asymptotic equipartition property) are given in [10]. A
source coding theorem for stationary sources is presented in
[7].

In [15], Sakrison extended the operational meaning of the 
rate distortion function to a class of sources. According to 
[15], when the class of sources is restricted to a compact 
class the rate distortion function of the class is precisely 
equal to the maximization over the class of the classical 
rate distortion function. Moreover, Sakrison's rate distortion 
function is calculated in [19] for finite alphabet class of 
sources. Related subsequent work is also found in [17], [18]. 

This paper is concerned with the rate distortion or source
coding problem with fidelity criterion on general abstract
spaces, for a class of source distributions. The class of source
distributions $\mu'$ is modeled by a relative entropy $H(\cdot\|\cdot)$
constraint, such that $H(\mu'\|\mu) \le R$, $R \ge 0$, where $R$ is
the distance from the so-called nominal source distribution $\mu$.
The rate distortion for this class is formulated using minimax
and maxmin strategies, with pay-off the mutual information
between the source and reconstruction symbols, in which the
minimum is with respect to the reconstruction conditional
distribution (stochastic kernel), and the maximum is with respect
to the source distribution $\mu'$ which satisfies $H(\mu'\|\mu) \le R$.

Clearly, a class of source distributions defined by
$\mathcal{M}_R(\mu) = \{\mu' \in \mathcal{M}_1(A);\ H(\mu'\|\mu) \le R\}$, $R \ge 0$
($\mathcal{M}_1(A)$ the set of probability distributions on $A$), is
appealing since relative entropy is often used as a measure of
distance between distributions, and $R_2 \ge R_1$ implies
$\mathcal{M}_{R_1}(\mu) \subseteq \mathcal{M}_{R_2}(\mu)$.
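
For finite alphabets, membership in the relative entropy class can be checked directly. The sketch below assumes distributions are given as lists of probabilities over the same finite alphabet; the helper names `relative_entropy` and `in_class` are hypothetical and only illustrate the definition and the nesting of the classes as the radius grows.

```python
import math

def relative_entropy(p_true, p_nominal):
    """Relative entropy H(p_true || p_nominal) in nats for finite distributions.

    Returns +inf when p_true puts mass where p_nominal has none.
    """
    h = 0.0
    for a, b in zip(p_true, p_nominal):
        if a > 0:
            if b == 0:
                return math.inf
            h += a * math.log(a / b)
    return h

def in_class(p_true, p_nominal, radius):
    """True if p_true lies in the relative entropy ball of the given radius."""
    return relative_entropy(p_true, p_nominal) <= radius

nominal = [0.5, 0.5]
candidate = [0.6, 0.4]
# A distribution admitted by a small radius is also admitted by a larger one.
print(in_class(candidate, nominal, 0.01), in_class(candidate, nominal, 0.05))
```

With radius 0 only the nominal distribution itself passes the check, which matches the case of no uncertainty.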

The objective is to compute both minimax and maxmin
rate distortion functions for the class $\mathcal{M}_R(\mu)$ and to show
the operational meaning of the minimax rate distortion function
by deriving a source coding theorem and its converse for this
class of sources. The minimax and maxmin rate distortion
functions are computed explicitly, deriving expressions for
the reproduction kernel which is a variant of (1). Moreover,
from the solution it follows that the minimax and maxmin
rate distortion functions yield the same answer. Due to space
limitations the derivation of the source coding theorem is omitted
and only the converse is presented.

II. PROBLEM FORMULATION 

Assume $(A, \mathcal{A})$ and $(\hat{A}, \hat{\mathcal{A}})$ are two measurable spaces,
where $A$ is the source space and $\hat{A}$ is the reproduction space.
Assume $q : A \times \hat{\mathcal{A}} \to [0, 1]$ is a mapping with the following
two properties:

1) For every $x \in A$, the set function $q(x, \cdot)$ is a probability
measure on $\hat{\mathcal{A}}$.

2) For every $F \in \hat{\mathcal{A}}$, the function $q(\cdot, F)$ is $\mathcal{A}$-measurable.

Mappings which satisfy 1) and 2) are called stochastic
kernels. Let $Q(A, \hat{A})$ denote the class of all such stochastic
kernels.

Given any measurable space with underlying set $\Sigma$, let
$\mathcal{M}_1(\Sigma)$ denote the space of probability measures on it.

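Let $\mu \in \mathcal{M}_1(A)$ be the source probability. For a given
pair $\{q \in Q(A, \hat{A}),\ \mu \in \mathcal{M}_1(A)\}$ we can define three other
probability measures as follows:

P1) The joint probability measure $P \in \mathcal{M}_1(A \times \hat{A})$ given
by

$$P(G) = (\mu \otimes q)(G) = \int_A q(x, G_x)\,\mu(dx), \quad \forall G \in \mathcal{A} \times \hat{\mathcal{A}}$$

where $G_x$ is the section of $G$ at point $x$, defined by
$G_x = \{y \in \hat{A} : (x, y) \in G\}$, and $\otimes$ denotes convolution.

P2) The marginal probability measure $\nu \in \mathcal{M}_1(\hat{A})$ given by

$$\nu(F) = P(A \times F) = \int_A q(x, (A \times F)_x)\,\mu(dx) = \int_A q(x, F)\,\mu(dx), \quad \forall F \in \hat{\mathcal{A}}$$

P3) The product measure $\pi : \mathcal{A} \times \hat{\mathcal{A}} \to [0, 1]$ of $\mu \in \mathcal{M}_1(A)$
and $\nu \in \mathcal{M}_1(\hat{A})$

$$\pi(G) = (\mu \times \nu)(G) = \int_A \nu(G_x)\,\mu(dx), \quad \forall G \in \mathcal{A} \times \hat{\mathcal{A}}$$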

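Let $\rho : A \times \hat{A} \to [0, \infty)$ be an $\mathcal{A} \times \hat{\mathcal{A}}$-measurable function,
and for each $D \in [0, \infty)$, define the set $Q(D)$ as

$$Q(D) = \Big\{ q : A \times \hat{\mathcal{A}} \to [0, 1];\ q(x, \hat{A}) = 1;\ \int_A \int_{\hat{A}} \rho(x,y)\,q(x,dy)\,\mu(dx) \le D \Big\}$$

where each $q \in Q(D)$ is $\mathcal{A}$-measurable for any $F \in \hat{\mathcal{A}}$ and
$q(x, \hat{A}) = 1$ for any $x \in A$. For a given $P \in \mathcal{M}_1(A \times \hat{A})$
and $\mu \in \mathcal{M}_1(A)$ we assume that $Q(D)$ is non-empty.
Given a fixed source measure $\mu \in \mathcal{M}_1(A)$, the rate distortion
function is defined as follows

$$R(D) = \inf_{q \in Q(D)} H(P\|\pi) = \inf_{q \in Q(D)} I(\mu; q) \tag{2}$$

where $H(P\|\pi)$ is the relative entropy between $P$ and $\pi$ and
is denoted by $I(\mu; q)$. More explicitly, by using

$$P(dx \times dy) = \mu(dx) \otimes q(x, dy), \qquad \pi(dx \times dy) = \mu(dx) \otimes \nu(dy)$$

$R(D)$ is given by

$$R(D) = \inf_{q \in Q(D)} I(\mu; q) = \inf_{q \in Q(D)} \int_A \int_{\hat{A}} \log\Big(\frac{q(x,dy)}{\nu(dy)}\Big)\, q(x,dy)\,\mu(dx) \tag{3}$$

where for every $q$ in $Q(D)$ we have

$$\int_{A \times \hat{A}} \rho(x,y)\,P(dx,dy) \le D$$

or

$$\int_A \int_{\hat{A}} \rho(x,y)\,q(x,dy)\,\mu(dx) \le D$$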
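
For finite alphabets the integrals of this section reduce to sums, which gives a quick way to experiment with the definitions. The sketch below is a minimal illustration under that assumption: `mu` is a list of source probabilities, `q` a row-stochastic matrix standing in for the stochastic kernel, and `rho` a distortion matrix; the function names are hypothetical.

```python
import math

def mutual_information(mu, q):
    """I(mu; q) = H(P || pi) in nats, with P = mu (x) q and pi = mu x nu."""
    nx, ny = len(mu), len(q[0])
    # Output marginal nu(y) = sum_x mu(x) q(x, y)
    nu = [sum(mu[x] * q[x][y] for x in range(nx)) for y in range(ny)]
    info = 0.0
    for x in range(nx):
        for y in range(ny):
            p_xy = mu[x] * q[x][y]
            if p_xy > 0:
                info += p_xy * math.log(q[x][y] / nu[y])
    return info

def expected_distortion(mu, q, rho):
    """Average distortion sum_x sum_y rho(x, y) q(x, y) mu(x)."""
    return sum(mu[x] * q[x][y] * rho[x][y]
               for x in range(len(mu)) for y in range(len(q[0])))

mu = [0.5, 0.5]
hamming = [[0.0, 1.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mutual_information(mu, identity), expected_distortion(mu, identity, hamming))
```

A kernel belongs to the constraint set for a given distortion level exactly when `expected_distortion` does not exceed that level; a kernel that ignores its input has zero mutual information.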

III. RATE DISTORTION FOR A FIXED SOURCE 

Throughout the rest of the paper we assume that both
$A$ and $\hat{A}$ are Polish spaces (complete, separable metric
spaces) and so normal topological spaces. The following
theorem, found in [14], is a generalization of [5], relaxing the
assumptions of compactness and absolute continuity while
identifying appropriate function spaces in which the solution
is sought.

Theorem 3.1: Let $A$, $\hat{A}$ be two Polish spaces and
$\rho : A \times \hat{A} \to [0, \infty]$ a measurable, nonnegative extended
real-valued function, continuous in the second argument, and
$\mu \in \mathcal{M}_1(A)$ be fixed. Then

1)

$$R(D) = \inf_{q \in Q(D)} I(\mu; q)$$

has a solution.

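2) Suppose the set $F = \{(x,y) \in A \times \hat{A};\ \rho(x,y) < D\}$ is
non-empty. Then the constrained problem $R(D)$ is equivalent
to the unconstrained problem

$$R(D) = \max_{s \le 0} \inf_{q \in Q(A, \hat{A})} \Big\{ I(\mu; q) - s\Big(\int_A \int_{\hat{A}} \rho(x,y)\,q(x,dy)\,\mu(dx) - D\Big) \Big\}$$

Further, the infimum occurs on the boundary of the set $Q(D)$
and the infimum is attained at

$$q^*(x, F) = \frac{\int_F e^{s\rho(x,y)}\,\nu^*(dy)}{\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)}, \quad s \le 0$$

The maximization over $s \le 0$, denoted by $s^*$, is found
from the constraint, which is satisfied with equality. The
corresponding rate distortion function has the following form

$$R(D) = s^* D - \int_A \log\Big(\int_{\hat{A}} e^{s^*\rho(x,y)}\,\nu^*(dy)\Big)\mu(dx)$$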

Note that the solution presented in Theorem 3.1 is one form
of the rate distortion solution. Alternative expressions are
found in [1]. The main objective of this paper is to extend
the results of Theorem 3.1 to a class of sources described by
a relative entropy constraint set, and to establish a source coding
theorem and its converse.
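
As a sanity check of the parametric form $R(D) = s^* D - \int \log(\int e^{s^*\rho}\,d\nu^*)\,d\mu$ in Theorem 3.1, it can be evaluated for the uniform binary source with Hamming distortion. The sketch below assumes the standard facts for that example, namely that the optimal reproduction marginal is uniform and that $s^* = \log(D/(1-D))$; the function name `parametric_rate` is an illustrative choice.

```python
import math

def parametric_rate(mu, nu_star, rho, s, D):
    """Evaluate s*D - sum_x mu(x) log( sum_y e^{s rho(x,y)} nu_star(y) ), in nats."""
    total = 0.0
    for x, m in enumerate(mu):
        inner = sum(math.exp(s * rho[x][y]) * n for y, n in enumerate(nu_star))
        total += m * math.log(inner)
    return s * D - total

def binary_entropy(p):
    """Binary entropy in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

D = 0.1
s_star = math.log(D / (1 - D))      # assumed optimal slope for this example
hamming = [[0.0, 1.0], [1.0, 0.0]]
rate = parametric_rate([0.5, 0.5], [0.5, 0.5], hamming, s_star, D)
# Classical closed form for comparison: log 2 - h(D)
print(rate, math.log(2) - binary_entropy(D))
```

The two printed values should agree up to rounding, which is the classical expression $\log 2 - h(D)$ for this source.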

IV. RATE DISTORTION FOR A CLASS OF 
SOURCES 

Let $\mu \in \mathcal{M}_1(A)$ denote the nominal (fixed) probability
measure, which is not the true source probability measure.
Further, assume the true source probability measure belongs
to the following relative entropy constraint set

$$\mathcal{M}_R(\mu) = \{\mu' \in \mathcal{M}_1(A);\ H(\mu'\|\mu) \le R\}$$

where $R \ge 0$ is given and fixed, in $[0, \infty)$. Clearly, the larger
$R$ is, the larger the class of distributions allowed in the set.
In the absence of uncertainty, the set $\mathcal{M}_R(\mu)$ reduces to the
singleton $\{\mu\}$. For a given $q \in Q(D)$ and a given
$\mu' \in \mathcal{M}_1(A)$ let $P' \in \mathcal{M}_1(A \times \hat{A})$ denote the joint
probability measure defined by

$$P'(G) = (\mu' \otimes q)(G) = \int_A q(x, G_x)\,\mu'(dx), \quad \forall G \in \mathcal{A} \times \hat{\mathcal{A}}$$

Also define the marginal probability measure $\nu' \in \mathcal{M}_1(\hat{A})$ by

$$\nu'(F) = P'(A \times F) = \int_A q(x, (A \times F)_x)\,\mu'(dx) = \int_A q(x, F)\,\mu'(dx), \quad \forall F \in \hat{\mathcal{A}}$$

Denote the product of $\mu' \in \mathcal{M}_1(A)$ and $\nu' \in \mathcal{M}_1(\hat{A})$ by $\pi'$,
defined by

$$\pi'(G) = (\mu' \times \nu')(G) = \int_A \nu'(G_x)\,\mu'(dx), \quad \forall G \in \mathcal{A} \times \hat{\mathcal{A}}$$

Let $\rho : A \times \hat{A} \to [0, \infty)$ be a non-negative
$\mathcal{A} \times \hat{\mathcal{A}}$-measurable function, and for each
$D \in [0, \infty)$, define the set $Q(D)$ as

$$Q(D) = \Big\{ q : A \times \hat{\mathcal{A}} \to [0, 1];\ q(x, \hat{A}) = 1;\ \int_A \int_{\hat{A}} \rho(x,y)\,q(x,dy)\,\mu'(dx) \le D,\ \forall \mu' \in \mathcal{M}_R(\mu) \Big\}$$

where each $q \in Q(D)$ is $\mathcal{A}$-measurable for any $F \in \hat{\mathcal{A}}$
and $q(x, \hat{A}) = 1$ for any $x \in A$.
Given the class $\mathcal{M}_R(\mu)$ of uncertain source probability
measures, the Rate Distortion for the class $\mathcal{M}_R(\mu)$ is
defined by

$$R_+(D) = \inf_{q \in Q(D)} \sup_{\mu' \in \mathcal{M}_R(\mu)} I(\mu'; q) \tag{4}$$



Note that in the minimax formulation of Rate Distortion
the uncertainty $\mu' \in \mathcal{M}_R(\mu)$ tries to maximize the rate of
reconstructing the source while the designer $q \in Q(D)$ tries
to minimize the rate. Thus, $R_+(D)$ is the rate distortion of
the class $\mathcal{M}_R(\mu)$.

An alternative formulation is to consider the maxmin Rate
Distortion

$$R_-(D) = \sup_{\mu' \in \mathcal{M}_R(\mu)} \inf_{q \in Q(D)} I(\mu'; q) \tag{5}$$

It can be shown that $R_+(D) \ge R_-(D)$, while equality
holds if the minisup Theorem 6.1 (see Appendix) holds. It
can be easily shown that by formulating $R_+(D)$, $R_-(D)$
using countably additive probability measures and weak
convergence as in [5], or regular bounded finitely additive
probability measures and weak* convergence as in [14],
the conditions of the minisup theorem, Theorem 6.1 (Appendix),
hold. Hence, $R_+(D) = R_-(D)$. Nevertheless, in the next
two Theorems we find the minimax and maxmin strategies
and then verify using these strategies that $R_+(D) = R_-(D)$.
Once these strategies are obtained and $R_+(D) = R_-(D)$ is
established, the solution of $R_-(D)$ is used to prove the
coding theorem.

Below we provide the solutions to $R_+(D)$ and $R_-(D)$.

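Theorem 4.1: Suppose $e^{\ell/\lambda} \in L_1(\mu)$ and $\ell e^{\ell/\lambda} \in L_1(\mu)$,
where $\ell(x) = -\log\big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\big)$, $\lambda > 0$. Then
the infimum and supremum of $R_-(D)$ are attained by the
following distributions:

$$\mu^*(dx) = \frac{\big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\big)^{-1/\lambda}\,\mu(dx)}{\int_A \big(\int_{\hat{A}} e^{s\rho(u,y)}\,\nu^*(dy)\big)^{-1/\lambda}\,\mu(du)}, \quad \lambda > 0$$

$$q^*(x, dy) = \frac{e^{s\rho(x,y)}\,\nu^*(dy)}{\int_{\hat{A}} e^{s\rho(x,z)}\,\nu^*(dz)}$$

where $s \le 0$, $\lambda > 0$ are found from the constraints.
The rate distortion $R_-(D)$ is given by

$$R_-(D) = sD + \lambda R + \lambda \log \int_A \Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)^{-1/\lambda} \mu(dx)$$

Proof. See Appendix.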
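
To see what the worst-case source of Theorem 4.1 looks like on a finite alphabet, the tilted measure can be computed directly from its definition: the nominal probability of each source symbol is reweighted by $\big(\sum_y e^{s\rho(x,y)}\nu^*(y)\big)^{-1/\lambda}$ and renormalized. The sketch below treats the reproduction marginal `nu_star` and the multipliers `s`, `lam` as given inputs (in the theorem they are determined by the constraints); the function name `tilted_source` and the numerical values are illustrative assumptions.

```python
import math

def tilted_source(mu, nu_star, rho, s, lam):
    """Worst-case source: mu*(x) proportional to
    (sum_y e^{s rho(x,y)} nu_star(y))^(-1/lam) * mu(x), for s <= 0, lam > 0."""
    weights = []
    for x, m in enumerate(mu):
        inner = sum(math.exp(s * rho[x][y]) * n for y, n in enumerate(nu_star))
        weights.append(inner ** (-1.0 / lam) * m)
    z = sum(weights)
    return [w / z for w in weights]

mu = [0.5, 0.3, 0.2]                          # nominal source (assumed)
nu_star = [0.5, 0.5]                          # reproduction marginal (assumed given)
rho = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]    # third symbol is always distorted
print(tilted_source(mu, nu_star, rho, s=-1.0, lam=1.0))
```

In this toy setting the tilt moves probability mass toward the symbol that is hardest to reproduce, and as `lam` grows the tilted source approaches the nominal one.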

The following Lemma is needed to be able to apply
Theorem 6.2 to find the solution of $R_+(D)$.

Lemma 4.2: Assume $e^{s\rho} \in L_1(\nu)$, $s \le 0$; then $\ell$ defined by

$$\ell(x) = \int_{\hat{A}} \log\Big( e^{-s\rho(x,y)}\,\frac{q(x,dy)}{\nu(dy)} \Big)\, q(x,dy), \quad s \le 0$$

is bounded below.
Proof. Omitted.

Using Lemma 4.2 and applying Theorem 6.2 (see Appendix)
similarly to Theorem 4.1, we deduce the solution of
$R_+(D)$.

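Theorem 4.3: Suppose $e^{s\rho} \in L_1(\nu)$, $s \le 0$; then the
supremum and infimum of $R_+(D)$ are attained by the
following distributions:

$$d\mu^* = \frac{e^{\ell/\lambda}\,d\mu}{\int_A e^{\ell/\lambda}\,d\mu}, \quad \lambda > 0$$

$$q^*(x, dy) = \frac{e^{s\rho(x,y)}\,\nu^*(dy)}{\int_{\hat{A}} e^{s\rho(x,z)}\,\nu^*(dz)}, \quad s \le 0$$

where $\ell$ is defined by

$$\ell(x) = \int_{\hat{A}} \log\Big( e^{-s\rho(x,y)}\,\frac{q^*(x,dy)}{\nu^*(dy)} \Big)\, q^*(x,dy)$$

and $s \le 0$, $\lambda > 0$ are found from the constraints.
The rate distortion $R_+(D)$ is given by

$$R_+(D) = sD + \lambda R + \lambda \log \int_A \Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)^{-1/\lambda} \mu(dx)$$

Proof. Follows as in Theorem 4.1.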

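Lemma 4.4: For any distribution $\mu'$ in the set $\mathcal{M}_R(\mu)$, we
have

$$H(\mu'\|\mu^*) \le R^*$$

where $\mu^*$ is the source distribution found in Theorem 4.1
with $\nu$ replaced by $\nu^*$ and $R^*$ is given by

$$R^* = \log\Big(\int_A \Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)^{-1/\lambda} \mu(dx)\Big) + R + \frac{1}{\lambda}\Big(\alpha R + \alpha \log\Big(\int_A \Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)^{1/\alpha} \mu(dx)\Big)\Big)$$

where $\lambda, \alpha \in \Re$ are constants.
Proof. See Appendix.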
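
The bound of Lemma 4.4 rests on expanding $H(\mu'\|\mu^*)$ around the nominal measure: since $\mu^*$ is an exponential tilt of $\mu$, one gets $H(\mu'\|\mu^*) = H(\mu'\|\mu) + \log Z + \frac{1}{\lambda}\int \log g\,d\mu'$, where $g(x) = \int e^{s\rho(x,y)}\nu^*(dy)$ and $Z = \int g^{-1/\lambda}\,d\mu$. The sketch below checks this identity numerically on a finite alphabet; the function names and all numerical inputs are illustrative assumptions.

```python
import math

def kl(a, b):
    """Relative entropy H(a || b) in nats for finite distributions with b > 0."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

def both_sides(mu_true, mu, nu_star, rho, s, lam):
    """Return (H(mu'||mu*), H(mu'||mu) + log Z + (1/lam) sum_x mu'(x) log g(x))."""
    g = [sum(math.exp(s * rho[x][y]) * n for y, n in enumerate(nu_star))
         for x in range(len(mu))]
    w = [gx ** (-1.0 / lam) * m for gx, m in zip(g, mu)]
    z = sum(w)                       # normalizing constant Z
    mu_star = [v / z for v in w]     # tilted (worst-case) source
    lhs = kl(mu_true, mu_star)
    rhs = (kl(mu_true, mu) + math.log(z)
           + (1.0 / lam) * sum(mp * math.log(gx) for mp, gx in zip(mu_true, g)))
    return lhs, rhs

lhs, rhs = both_sides([0.4, 0.4, 0.2], [0.5, 0.3, 0.2], [0.5, 0.5],
                      [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], s=-1.0, lam=2.0)
print(lhs, rhs)
```

The two printed numbers should coincide up to rounding for any admissible choice of inputs.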

Next we state the rate distortion theorem for uncertain
discrete memoryless sources with distributions in $\mathcal{M}_R(\mu)$.
The derivation follows the same steps as in [1].

Theorem 4.5: Robust source coding theorem. Let the set
of discrete memoryless sources $\{X, \mu'\}$ with $H(\mu'\|\mu) \le R$,
and a single letter fidelity criterion be given. Let $R^*(D)$ denote
the robust rate distortion function defined in Theorem 4.1 or
Theorem 4.3. Then, given any $\epsilon > 0$ and any $D > 0$, an
integer $n$ and a source code with block length $n$ and rate
$\mathcal{R} < R^*(D) + \epsilon$ exist, such that for any distribution from
the set $\mathcal{M}_R(\mu)$, the code is $(D + \epsilon)$-admissible.
Proof. The basic idea for the proof is to construct the code
based on $q^*$ and $\nu^*$, found in the robust rate distortion
formulation (see Theorem 4.1). However, due to the uncertainty in
the source distribution, all the averages are taken with respect
to $\mu'$ and $q^*$. Then these averages are related to averages
which appear in the rate distortion theorem for $\mu^*$ and $q^*$,
which are the solutions to the robust rate distortion problem.
The proof is based on a random coding argument.

Theorem 4.6: Converse to the robust source coding
theorem. Every code which is $D$-admissible for the whole
class of source distributions $\mathcal{M}_R(\mu)$ has rate at least
$R^*(D)$, i.e.,

$$\frac{1}{n} \log K(n, D) \ge R^*(D)$$

where $K(n, D)$ is the number of $D$-admissible codewords
of length $n$ in the code.

Proof. Suppose code $C$ with rate $\mathcal{R} = \frac{1}{n} \log K(n, D)$ is
$D$-admissible for the whole class of source distributions
$\mathcal{M}_R(\mu)$. Then by the converse source coding theorem for
a fixed source distribution $\mu'$ from the set, we have

$$\frac{1}{n} \log K(n, D) \ge R_{\mu'}(D) \tag{6}$$

where $R_{\mu'}(D)$ is the rate distortion function for the source
with distribution $\mu'$. Since our code is $D$-admissible for any
$\mu' \in \mathcal{M}_R(\mu)$, (6) must hold for all $\mu' \in \mathcal{M}_R(\mu)$:

$$\frac{1}{n} \log K(n, D) \ge R_{\mu'}(D), \quad \forall \mu' \in \mathcal{M}_R(\mu)$$

Taking the supremum of both sides with respect to $\mu'$ leads to

$$\frac{1}{n} \log K(n, D) \ge \sup_{\mu' \in \mathcal{M}_R(\mu)} R_{\mu'}(D)$$

By Theorem 4.1 we have $\frac{1}{n} \log K(n, D) \ge R^*(D)$.
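
The supremum over the class that appears in the converse can be illustrated for a binary memoryless source with Hamming distortion, where the fixed-source rate distortion function has the classical closed form $h(p') - h(D)$ for $D < \min(p', 1-p')$ and is zero otherwise. The sketch below approximates the supremum over the relative entropy ball by a grid search; the function names, the grid size and the numerical values are illustrative assumptions, and the grid search is a stand-in for the explicit solution of Theorem 4.1.

```python
import math

def h(p):
    """Binary entropy in nats."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def kl_binary(a, b):
    """H(Bernoulli(a) || Bernoulli(b)) in nats, for 0 < b < 1."""
    def term(u, v):
        return 0.0 if u == 0 else u * math.log(u / v)
    return term(a, b) + term(1 - a, 1 - b)

def rate_binary(p, D):
    """Classical R(D) of a Bernoulli(p) source under Hamming distortion."""
    return max(h(p) - h(D), 0.0) if D < min(p, 1 - p) else 0.0

def worst_case_rate(p_nominal, radius, D, grid=10001):
    """Grid approximation of sup over {p' : H(p'||p) <= radius} of R_{p'}(D)."""
    best = rate_binary(p_nominal, D)  # the nominal source is always in the class
    for i in range(1, grid):
        p = i / grid
        if kl_binary(p, p_nominal) <= radius:
            best = max(best, rate_binary(p, D))
    return best

print(rate_binary(0.2, 0.1), worst_case_rate(0.2, 0.05, 0.1))
```

The worst-case rate is never below the nominal rate, and it saturates at $\log 2 - h(D)$ once the ball is large enough to contain the uniform source.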

V. CONCLUSION 

The problem of rate distortion is extended to the case of
uncertain sources, in which the uncertainty about
the true source distribution is described by a relative entropy
constraint set between the true and a nominal distribution.
The rate distortion problem is thus formulated and solved 
using minimax strategies, which results in robust source 
coding with fidelity criterion. The solution is found for 
both minimax and maxmin strategies. Finally, for discrete 
memoryless uncertain sources, the rate distortion theorem is 
stated and its converse is proved. 

VI. APPENDIX 

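The next minisup Theorem states sufficient conditions for
$R_+(D) = R_-(D)$.

Theorem 6.1: Minisup Theorem [16]. Let $f(x,y)$ be defined
for $x \in \mathcal{X}$, $y \in \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are convex subsets
of topological vector spaces and $\mathcal{X}$ is compact, and let $f(x,y)$ be
convex and lower semicontinuous in $x \in \mathcal{X}$ for each $y \in \mathcal{Y}$
and concave in $y \in \mathcal{Y}$ for each $x \in \mathcal{X}$.
Then there exists an $x^* \in \mathcal{X}$ such that

$$\sup_{y \in \mathcal{Y}} \min_{x \in \mathcal{X}} f(x,y) = \sup_{y \in \mathcal{Y}} f(x^*, y) = \min_{x \in \mathcal{X}} \sup_{y \in \mathcal{Y}} f(x,y)$$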

The next theorem gives the duality between relative entropy
and free energy.

Theorem 6.2: [13] Let $\ell : \Sigma \to \Re$ be a measurable
function bounded below and $\mu \in \mathcal{M}_1(\Sigma)$. Then

$$\sup_{\{\nu \in \mathcal{M}_1(\Sigma);\ H(\nu\|\mu) < \infty\}} \Big\{ \int_\Sigma \ell(x)\,\nu(dx) - H(\nu\|\mu) \Big\} = \log \int_\Sigma e^{\ell(x)}\,\mu(dx)$$

Moreover, if $\ell e^{\ell} \in L_1(\mu)$ then the supremum is attained by
the tilted $\nu^* \in \mathcal{M}_1(\Sigma)$ given by

$$\nu^*(dx) = \frac{e^{\ell(x)}\,\mu(dx)}{\int_\Sigma e^{\ell(x)}\,\mu(dx)}$$
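
The duality between relative entropy and free energy is easy to verify numerically on a finite set, where the supremum is a finite-dimensional concave maximization. The sketch below assumes a three-point space with a strictly positive reference measure; the function names `free_energy`, `tilted` and `objective` and the numerical values are illustrative choices.

```python
import math

def free_energy(ell, mu):
    """log sum_x e^{ell(x)} mu(x)."""
    return math.log(sum(math.exp(l) * m for l, m in zip(ell, mu)))

def tilted(ell, mu):
    """Tilted measure nu*(x) = e^{ell(x)} mu(x) / sum_u e^{ell(u)} mu(u)."""
    w = [math.exp(l) * m for l, m in zip(ell, mu)]
    z = sum(w)
    return [x / z for x in w]

def objective(ell, nu, mu):
    """sum_x ell(x) nu(x) - H(nu || mu), assuming mu(x) > 0 wherever nu(x) > 0."""
    return sum(n * l - n * math.log(n / m)
               for l, n, m in zip(ell, nu, mu) if n > 0)

ell = [0.0, 1.0, 2.0]
mu = [0.5, 0.3, 0.2]
nu_star = tilted(ell, mu)
# The tilted measure attains the free energy; other measures fall below it.
print(objective(ell, nu_star, mu), free_energy(ell, mu), objective(ell, mu, mu))
```

The first two printed values should agree up to rounding, while the third, evaluated at the untilted reference measure, should be smaller.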



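Proof of Theorem 4.1. By Theorem 3.1 we already know that

$$R_{\mu'}(D) = \inf_{q \in Q(D)} I(\mu'; q) = \max_{s \le 0} \Big\{ sD - \int_A \log\Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)\mu'(dx) \Big\}$$

Now consider the function
$\ell(x) = -\log\big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\big)$. From previous results, we
know that $s \le 0$, hence $\ell(x) \ge 0$, and therefore $\ell(x)$
is a bounded-below measurable function defined on the
measurable space $(A, \mathcal{A})$. Now we solve the problem
$R_-(D)$, using Lagrange multipliers.

$$R_-(D) = \sup_{\mu' \in \mathcal{M}_R(\mu)} R_{\mu'}(D)$$

$$= \min_{\lambda \ge 0} \sup_{\mu'} \Big\{ R_{\mu'}(D) - \lambda\big(H(\mu'\|\mu) - R\big) \Big\}$$

$$= \min_{\lambda \ge 0} \sup_{\mu'} \max_{s \le 0} \Big\{ sD + \int_A \ell(x)\,\mu'(dx) - \lambda\big(H(\mu'\|\mu) - R\big) \Big\}$$

Now we use the duality between relative entropy and free
energy as explained in Theorem 6.2 to find the above
supremum. If $e^{\ell/\lambda} \in L_1(\mu)$ and $\ell e^{\ell/\lambda} \in L_1(\mu)$, $\lambda > 0$, then the
supremum is attained for $\mu^*$ given by

$$\mu^*(dx) = \frac{e^{\ell(x)/\lambda}\,\mu(dx)}{\int_A e^{\ell(u)/\lambda}\,\mu(du)}$$

Hence

$$R_-(D) = \sup_{\mu' \in \mathcal{M}_R(\mu)} R_{\mu'}(D) = \min_{\lambda \ge 0} \max_{s \le 0} \Big\{ sD + \lambda R + \lambda \log\Big(\int_A e^{\ell/\lambda}\,d\mu\Big) \Big\} \tag{7}$$

where the min over $\lambda \ge 0$, denoted by $\lambda^*$, is chosen such that
$H(\mu^*\|\mu)|_{\lambda = \lambda^*} = R$, and the max over $s \le 0$, denoted by $s^*$, is
chosen such that the distortion constraint holds with equality.
Now (7) can be written as

$$R_-(D) = \sup_{\mu' \in \mathcal{M}_R(\mu)} R_{\mu'}(D) = s^* D + \lambda^* R + \lambda^* \log \int_A \Big(\int_{\hat{A}} e^{s^*\rho(x,y)}\,\nu^*(dy)\Big)^{-1/\lambda^*} \mu(dx)$$

Also by Theorem 3.1 the reproduction kernel $q^*$ for the rate
distortion problem defined for the source $\mu^*$ is given by

$$q^*(x, dy) = \frac{e^{s^*\rho(x,y)}\,\nu^*(dy)}{\int_{\hat{A}} e^{s^*\rho(x,z)}\,\nu^*(dz)}$$

Proof of Lemma 4.4. Now $\mu^*$ can be found from Theorem 4.1, so we have

$$H(\mu'\|\mu^*) = H(\mu'\|\mu) + \log\Big(\int_A \Big(\int_{\hat{A}} e^{s\rho(z,y)}\,\nu^*(dy)\Big)^{-1/\lambda} \mu(dz)\Big) + \frac{1}{\lambda} \int_A \log\Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)\mu'(dx) \tag{8}$$

In (8), we can find the supremum of the right-hand side term

$$J = \sup_{\mu' \in \mathcal{M}_R(\mu)} \int_A \log\Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)\mu'(dx)$$

The function inside the first integral is measurable and
bounded below by zero. Let
$\ell(x) = \log\big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\big)$. Then

$$J = \sup_{\mu' \in \mathcal{M}_R(\mu)} \int_A \ell\,d\mu' = \alpha R + \alpha \log\Big(\int_A e^{\ell/\alpha}\,d\mu\Big) \tag{9}$$

where $\alpha$ is chosen in a way that $H(\mu'^{*}\|\mu) = R$ for the
following measure

$$\mu'^{*}(dx) = \frac{e^{\ell(x)/\alpha}\,\mu(dx)}{\int_A e^{\ell(u)/\alpha}\,\mu(du)}$$

Combine (8) and (9) to get

$$H(\mu'\|\mu^*) \le \log\Big(\int_A \Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)^{-1/\lambda} \mu(dx)\Big) + R + \frac{1}{\lambda}\Big(\alpha R + \alpha \log\Big(\int_A \Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)^{1/\alpha} \mu(dx)\Big)\Big) \tag{10}$$

Also the supremum in (9) is achieved for $\mu'^{*}$, and for this
measure we have $H(\mu'^{*}\|\mu) = R$. Therefore the right hand
side of (10) is achieved for $\mu'^{*}$ and

$$H(\mu'^{*}\|\mu^*) = R^*$$

where

$$R^* = \log\Big(\int_A \Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)^{-1/\lambda} \mu(dx)\Big) + R + \frac{1}{\lambda}\Big(\alpha R + \alpha \log\Big(\int_A \Big(\int_{\hat{A}} e^{s\rho(x,y)}\,\nu^*(dy)\Big)^{1/\alpha} \mu(dx)\Big)\Big)$$

References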